Beginning Data Science With R by Manas A. Pathak

Author: Manas A. Pathak [Pathak, Manas A.]
Language: eng
Format: epub, pdf
Tags: Applied, Technology & Engineering, Computers, Mathematical & Statistical Software, General, Imaging Systems, Mathematics, Electronics
ISBN: 9783319120669
Google: RWHEBQAAQBAJ
Publisher: Springer
Published: 2014-12-08T20:54:29+00:00


Fig. 5.3 Box plot of BIRTHS2010 and DEATHS2010 for micropolitan statistical areas

> boxplot(data.micro$BIRTHS2010, data.micro$DEATHS2010, names=c('BIRTHS2010','DEATHS2010'))

We do not need to set show.names=T when we call boxplot() for multiple variables. We see that the top whisker and the extremities for BIRTHS2010 are longer than those for DEATHS2010. This implies that the BIRTHS2010 variable is more spread out than DEATHS2010. We also see that all five statistics for BIRTHS2010 are higher than those for DEATHS2010. This implies that the BIRTHS2010 values are overall higher than the DEATHS2010 values.
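The comparison of the five statistics can also be checked numerically. The following is a minimal sketch, assuming a small synthetic data frame named toy (a hypothetical stand-in for data.micro, not census data); it relies on boxplot() returning the statistics it would draw.

```r
# Hypothetical stand-in for data.micro; the values are synthetic, not census data.
set.seed(1)
toy <- data.frame(
  BIRTHS2010 = rpois(50, lambda = 160),
  DEATHS2010 = rpois(50, lambda = 120)
)

var_names <- c("BIRTHS2010", "DEATHS2010")

# boxplot() returns the statistics behind the plot; plot = FALSE skips the drawing.
bp <- boxplot(toy$BIRTHS2010, toy$DEATHS2010, names = var_names, plot = FALSE)

# Each column of bp$stats holds, from top to bottom:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
stats <- bp$stats
dimnames(stats) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  var_names
)
print(stats)
```

Comparing the two columns row by row shows whether each statistic of one variable is higher than the corresponding statistic of the other. Dropping plot = FALSE draws the box plot as well.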

The boxplot() function can also compare all variables in a data frame together when it is called with the data frame itself: boxplot(data.micro). We can also draw box plots for one variable over data split by another variable. In our original data frame, the LSAD variable denotes whether the entry corresponds to a county or equivalent, metropolitan area, metropolitan statistical area, or micropolitan statistical area. We compute the box plot for BIRTHS2010 over this partition below. The output of this function is shown in Fig. 5.4.

> boxplot(data$BIRTHS2010 ~ data$LSAD)

Fig. 5.4 Box plot of BIRTHS2010 for all areas
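The formula call above can also be written with the data argument, which avoids repeating the data frame name. This is a sketch only, assuming a synthetic data frame named toy with two made-up LSAD groups (hypothetical values, not the census data used in the text).

```r
# Hypothetical stand-in for the full data frame; synthetic values only.
set.seed(2)
toy <- data.frame(
  LSAD = rep(c("County or equivalent", "Micropolitan Statistical Area"), each = 30),
  BIRTHS2010 = c(rpois(30, lambda = 400), rpois(30, lambda = 150))
)

# Same grouping as boxplot(toy$BIRTHS2010 ~ toy$LSAD), written with data = toy.
# plot = FALSE returns the per-group statistics without drawing.
bp <- boxplot(BIRTHS2010 ~ LSAD, data = toy, plot = FALSE)

print(bp$names)  # one box per LSAD value
print(bp$n)      # number of observations in each group
print(bp$stats)  # five statistics per group, one column per box
```

Removing plot = FALSE draws one box per LSAD value, in the same layout as Fig. 5.4.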